Extending SPARQL Algebra to Support Efficient Evaluation of Top-K SPARQL Queries
Authors
Abstract
With the widespread adoption of Linked Data, the efficient processing of SPARQL queries is gaining importance. A crucial category of queries that lends itself to optimization is “top-k” queries, i.e., queries that return the top k results ordered by a specified ranking function. Top-k queries can be expressed in SPARQL by appending the ORDER BY and LIMIT clauses to a SELECT query; these clauses impose a sorting order on the result set and limit the number of results. However, in SPARQL algebra ORDER BY and LIMIT are solution modifiers, i.e., they are evaluated only after all other query clauses have been evaluated. Evaluating them in SPARQL engines therefore typically requires processing all matching solutions (possibly thousands) and then computing the ranking function monolithically for each solution, even if only a small number of them (e.g., k = 10) is requested, which leads to poor performance. In this paper, we present SPARQL-RANK, an extension of the SPARQL algebra and execution model that supports ranking as a first-class SPARQL construct. The new algebra and execution model allow the ranking function to be split into parts and interleaved with other operations. We also provide a prototype open-source implementation of SPARQL-RANK based on ARQ, and we carry out a series of preliminary experiments.
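To make the contrast above concrete, here is a minimal Python sketch. It assumes a ranking function that is the sum of two partial scores, each attached to the bindings of one triple pattern and available in descending score order; the inputs `left`/`right` and the function names are hypothetical. `materialize_then_sort` mimics evaluating `ORDER BY DESC(?s1 + ?s2) LIMIT k` as a final solution modifier, while `rank_join` is a generic HRJN-style rank join, one standard way to interleave a split, monotone ranking function with a join and stop early. It illustrates the idea only and is not the SPARQL-RANK or ARQ implementation.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical input format: (join_key, partial_score) pairs, e.g. the bindings
# of one triple pattern plus one partial score, sorted by descending score.
Scored = Tuple[str, int]


def materialize_then_sort(left: List[Scored], right: List[Scored], k: int) -> List[Scored]:
    """Solution-modifier style: ORDER BY DESC(?s1 + ?s2) LIMIT k applied last."""
    # Compute every matching solution and its full score first...
    joined = [(lk, ls + rs) for lk, ls in left for rk, rs in right if lk == rk]
    # ...then sort all of them, even though only k are returned.
    joined.sort(key=lambda t: t[1], reverse=True)
    return joined[:k]


def rank_join(left: List[Scored], right: List[Scored], k: int) -> Tuple[List[Scored], int]:
    """HRJN-style rank join for the monotone score F = s1 + s2.

    Returns the top-k joined results and the number of input tuples read.
    """
    if not left or not right or k <= 0:
        return [], 0
    seen_l: Dict[str, List[int]] = {}
    seen_r: Dict[str, List[int]] = {}
    heap: List[Tuple[int, str]] = []  # candidate results, max-heap via negated score
    results: List[Scored] = []
    i = j = 0
    while len(results) < k:
        # Pull the next tuple, alternating between the two sorted inputs.
        if i < len(left) and (i <= j or j >= len(right)):
            key, s = left[i]
            i += 1
            seen_l.setdefault(key, []).append(s)
            for rs in seen_r.get(key, []):
                heapq.heappush(heap, (-(s + rs), key))
        elif j < len(right):
            key, s = right[j]
            j += 1
            seen_r.setdefault(key, []).append(s)
            for ls in seen_l.get(key, []):
                heapq.heappush(heap, (-(ls + s), key))
        # Upper bound on any result not yet formed: it must use an unseen tuple
        # from some input, whose score is at most that input's next score.
        threshold = float("-inf")
        if i < len(left):
            threshold = max(threshold, left[i][1] + right[0][1])
        if j < len(right):
            threshold = max(threshold, left[0][1] + right[j][1])
        # Emit candidates that no unseen combination can beat.
        while heap and len(results) < k and -heap[0][0] >= threshold:
            neg_score, key = heapq.heappop(heap)
            results.append((key, -neg_score))
        if i >= len(left) and j >= len(right) and not heap:
            break
    return results, i + j
```

On the toy input `left = [("a", 9), ("b", 8), ("c", 3), ("d", 1)]` and `right = [("b", 9), ("a", 4), ("d", 3), ("c", 2)]` with `k = 2`, both functions return `[("b", 17), ("a", 13)]`, but `rank_join` stops after reading 4 of the 8 input tuples, because the threshold proves no unread combination can outscore the results already emitted.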
Similar articles
Efficient Execution of Top-K SPARQL Queries
Top-k queries, i.e. queries returning the top k results ordered by a user-defined scoring function, are an important category of queries. Order is an important property of data that can be exploited to speed up query processing. State-of-the-art SPARQL engines underuse order, and top-k queries are mostly managed with a materialize-then-sort processing scheme that computes all the matching solut...
Efficient evaluation of SPARQL over RDFS graphs
Evaluating queries over RDFS is currently done by computing the closure, which is quadratic in the worst case. In this paper, we show how to evaluate SPARQL over RDFS graphs by using a navigational language grounded in standard relational algebra with operators for navigating triples with regular constraints. In particular, we show that there is a fragment of this language with linear time data...
STREAK: An Efficient Engine for Processing Top-k SPARQL Queries with Spatial Filters
The importance of geo-spatial data in critical applications such as emergency response, transportation, agriculture, etc., has prompted the adoption of the recent GeoSPARQL standard in many RDF processing engines. In addition to large repositories of geo-spatial data (e.g., LinkedGeoData, OpenStreetMap, etc.), spatial data is also routinely found in automatically constructed knowledge bases such as Ya...
Predicting SPARQL Query Performance
We address the problem of predicting SPARQL query performance. We use machine learning techniques to learn SPARQL query performance from previously executed queries. We show how to model SPARQL queries as feature vectors, and use k-nearest neighbors regression and Support Vector Machine with the nu-SVR kernel to accurately (R value of 0.98526) predict SPARQL query execution time. 1 Query Perfo...
Processing Heterogeneous RDF Events with Standing SPARQL Update Rules
The SPARQL query language is designed for querying datasets encoded in RDF. SPARQL Update adds support for insert and delete operations between graph stores, enabling queries to process data in steps, have persistent memory and communicate with each other. When used in a system supporting incremental evaluation of multiple simultaneously active and collaborating queries, SPARQL can define entire event p...